KMID : 1022420190110010041
Phonetics and Speech Sciences
2019, Volume 11, No. 1, pp. 41-49
Visual analysis of attention-based end-to-end speech recognition
Lim Seong-Min, Goo Ja-Hyu, Kim Hoi-Rin
Abstract
An end-to-end speech recognition model consisting of a single integrated neural network was recently proposed. The end-to-end model does not require separate training stages, and its structure is easy to understand. However, it is difficult to understand how the model recognizes speech internally. In this paper, we visualized and analyzed an attention-based end-to-end model to elucidate its internal mechanisms. We compared the acoustic model of a BLSTM-HMM hybrid system with the encoder of the end-to-end model, visualizing both with t-SNE to examine the differences between neural network layers. As a result, we were able to delineate the difference between the acoustic model and the end-to-end model's encoder. Additionally, we analyzed the decoder of the end-to-end model from a language-model perspective. Finally, we found that improving the end-to-end model's decoder is necessary to achieve higher performance.
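The layer-wise comparison described in the abstract relies on t-SNE to project high-dimensional hidden activations into two dimensions. A minimal sketch of such a projection using scikit-learn's `TSNE`; the array shapes and synthetic activations here are illustrative assumptions, not the paper's actual encoder data:

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative stand-in for per-frame hidden activations from an
# encoder (or acoustic-model) layer: 200 frames x 256 hidden units.
rng = np.random.default_rng(0)
activations = rng.standard_normal((200, 256))

# Project to 2-D; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(activations)

print(embedding.shape)  # (200, 2)
```

In practice, one would collect activations from matching layers of both models for the same utterances and color the 2-D points by phoneme or state label to compare how each layer separates the classes.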
KEYWORD
speech recognition, end-to-end, t-SNE, sequence-to-sequence
Listed journal information
Korea Citation Index (KCI)